85 research outputs found

    The Weight Function in the Subtree Kernel is Decisive

    Tree data are ubiquitous because they model a large variety of situations, e.g., the architecture of plants, the secondary structure of RNA, or the hierarchy of XML files. Nevertheless, the analysis of these non-Euclidean data is difficult per se. In this paper, we focus on the subtree kernel, a convolution kernel for tree data introduced by Vishwanathan and Smola in the early 2000s. More precisely, we investigate the influence of the weight function, both from a theoretical perspective and in real data applications. We establish on a two-class stochastic model that the performance of the subtree kernel improves when the weight of leaves vanishes, which motivates the definition of a new weight function that is learned from the data rather than fixed by the user, as is usually done. To this end, we define a unified framework for computing the subtree kernel from ordered or unordered trees that is particularly suitable for tuning parameters. Through eight real data classification problems, we show the high efficiency of our approach, in particular for small datasets, which also confirms the importance of the weight function. Finally, a visualization tool for the significant features is derived. Comment: 36 pages
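    A minimal sketch of a weighted subtree kernel, assuming the common form K(T1, T2) = sum over complete subtrees s shared by both trees of w(s) * N_s(T1) * N_s(T2), where N_s counts occurrences of s. The tree encoding (nested tuples), the helper names, and the height-based weight `decaying_weight` with its `lam` and `leaf_weight` parameters are hypothetical illustrations, not the paper's learned weight function. Setting `leaf_weight=0.0` mimics the "vanishing leaf weight" regime discussed in the abstract.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

# Hypothetical encoding: a tree is a nested tuple of its children; a leaf is ().
Tree = Tuple


def _walk(tree: Tree, ordered: bool, counts: Counter, heights: Dict[str, int]) -> Tuple[str, int]:
    """Encode every complete subtree of `tree`, counting occurrences and recording heights."""
    child_info = [_walk(child, ordered, counts, heights) for child in tree]
    codes = [code for code, _ in child_info]
    if not ordered:
        codes.sort()  # unordered trees: the order of children is irrelevant
    code = "(" + "".join(codes) + ")"
    h = 1 + max(ch for _, ch in child_info) if child_info else 0
    counts[code] += 1
    heights[code] = h
    return code, h


def decaying_weight(h: int, lam: float = 0.5, leaf_weight: float = 1.0) -> float:
    """Illustrative weight: lam**height, with a separately tunable weight for leaves."""
    return leaf_weight if h == 0 else lam ** h


def subtree_kernel(t1: Tree, t2: Tree, weight: Callable[[int], float], ordered: bool = False) -> float:
    """Sum over shared complete subtrees s of weight(height(s)) * N_s(t1) * N_s(t2)."""
    c1, h1 = Counter(), {}
    c2, h2 = Counter(), {}
    _walk(t1, ordered, c1, h1)
    _walk(t2, ordered, c2, h2)
    return sum(weight(h1[s]) * c1[s] * c2[s] for s in c1.keys() & c2.keys())
```

    For example, for a root with two leaves, `subtree_kernel(((), ()), ((), ()), decaying_weight)` gives 4 * 1.0 from the two leaves plus 0.5 from the root, i.e. 4.5; with `leaf_weight=0.0` only the root term 0.5 remains. The same function covers ordered and unordered trees through the `ordered` flag, which only changes whether children codes are sorted.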

    Optimal choice among a class of nonparametric estimators of the jump rate for piecewise-deterministic Markov processes

    A piecewise-deterministic Markov process is a stochastic process whose behavior is governed by an ordinary differential equation punctuated by random jumps occurring at random times. We focus on the nonparametric estimation of the jump rate of such a stochastic model observed over a long time interval under an ergodicity condition. We introduce an uncountable class (indexed by the deterministic flow) of recursive kernel estimators of the jump rate and establish their strong pointwise consistency as well as their asymptotic normality. We propose to choose from this class the estimator with minimal variance; this variance is unfortunately unknown and must itself be estimated. We also discuss the choice of the bandwidth parameters by cross-validation methods. Comment: 36 pages, 18 figures
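    To make the model concrete, here is a toy simulation of a piecewise-deterministic Markov process. The flow dx/dt = 1, the jump rate lambda(x) = x, and the "halve the state at each jump" rule are hypothetical choices, not taken from the paper. Because the integrated rate along the flow, x*s + s^2/2, can be inverted in closed form, each inter-jump time is drawn exactly from an Exp(1) variable. The resulting sequence of jump times and states is the kind of data from which jump-rate estimators are computed; this sketch does not implement the paper's recursive kernel estimators.

```python
import math
import random
from typing import List, Tuple


def simulate_pdmp(x0: float, horizon: float, seed: int = 0) -> List[Tuple[float, float, float]]:
    """Simulate a toy PDMP on [0, horizon].

    Hypothetical model: flow x(t) = x + t between jumps, jump rate lambda(x) = x,
    and the post-jump state is half the pre-jump state.
    Returns a list of (jump_time, pre_jump_state, post_jump_state).
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    jumps = []
    while True:
        e = rng.expovariate(1.0)  # Exp(1) draw for the next jump
        # Integrated rate along the flow: Lambda(s) = x*s + s**2/2; solve Lambda(s) = e.
        s = -x + math.sqrt(x * x + 2.0 * e)
        if t + s > horizon:
            break
        t += s
        pre = x + s          # state reached by following the flow
        x = pre / 2.0        # deterministic jump: halve the state
        jumps.append((t, pre, x))
    return jumps
```

    For instance, `simulate_pdmp(1.0, 50.0, seed=1)` returns a reproducible, time-ordered list of jumps; the exact inversion avoids the thinning step that a general jump rate would require.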

    Integral estimation based on Markovian design

    Suppose that a mobile sensor follows a Markovian trajectory in the ambient space. At each time, the sensor measures an attribute of interest, e.g., the temperature. Using only the location history of the sensor and the associated measurements, the aim is to estimate the average value of the attribute over the space. In contrast to classical probabilistic integration methods, e.g., Monte Carlo, the proposed approach does not require any knowledge of the distribution of the sensor trajectory. Probabilistic bounds on the convergence rates of the estimator are established. These rates are better than the traditional root-n rate, where n is the sample size, of other probabilistic integration methods. For finite sample sizes, the good behaviour of the procedure is demonstrated through simulations, and an application to evaluating the average temperature of the oceans is considered. Comment: 45 pages
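    The key idea, that the average can be estimated from visited locations and measurements alone, without the sensor's transition law, can be illustrated in one dimension. The sketch below swaps in a simple technique, the trapezoidal rule on the sorted visited locations of [0, 1]; it is not the paper's estimator and makes no claim about its convergence rate. The reflected Gaussian random walk and the names `reflected_random_walk` and `trapezoid_average` are hypothetical.

```python
import random
from typing import List, Sequence


def reflected_random_walk(n: int, step: float = 0.05, seed: int = 0) -> List[float]:
    """Hypothetical sensor: a Markov chain on [0, 1] with Gaussian steps, reflected at 0 and 1."""
    rng = random.Random(seed)
    x, path = 0.5, []
    for _ in range(n):
        x += rng.gauss(0.0, step)
        while x < 0.0 or x > 1.0:  # reflect back into [0, 1]
            x = -x if x < 0.0 else 2.0 - x
        path.append(x)
    return path


def trapezoid_average(locations: Sequence[float], values: Sequence[float]) -> float:
    """Average of f over [0, 1] from (location, value) pairs via the trapezoidal rule.

    Uses only the observed pairs; the distribution of the trajectory is never needed.
    """
    pts = sorted(zip(locations, values))
    # Constant extrapolation from the extreme visited points to the endpoints 0 and 1.
    total = pts[0][0] * pts[0][1] + (1.0 - pts[-1][0]) * pts[-1][1]
    for (xa, ya), (xb, yb) in zip(pts, pts[1:]):
        total += 0.5 * (xb - xa) * (ya + yb)
    return total  # the domain has length 1, so the integral equals the average
```

    Usage: with `locs = reflected_random_walk(5000, seed=3)` and measurements `[x * x for x in locs]`, `trapezoid_average(locs, values)` should be close to the true average 1/3.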

    Nonparametric estimation of the conditional distribution of the inter-jumping times for piecewise-deterministic Markov processes

    This paper presents a nonparametric method for estimating the conditional density associated with the jump rate of a piecewise-deterministic Markov process. In our framework, the estimation requires only one observation of the process over a long time interval. Our method relies on a generalization of Aalen's multiplicative intensity model. We prove the uniform consistency of our estimator under some reasonable assumptions on the primitive characteristics of the process. A simulation example illustrates the behavior of our estimator.
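    Aalen's multiplicative intensity model underlies the classical Nelson-Aalen estimator of a cumulative hazard. The sketch below shows that baseline for inter-jump times, under the simplifying assumption that the durations are exchangeable and not conditioned on the post-jump state; the paper's generalization, which does condition on the state, is not reproduced here. The function name and its arguments are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def nelson_aalen(durations: Sequence[float], observed: Sequence[bool]) -> List[Tuple[float, float]]:
    """Nelson-Aalen estimate H(t) = sum over event times t_i <= t of d_i / n_i.

    durations[i]: time between consecutive jumps (or until censoring);
    observed[i]: False if the duration is censored, e.g. the last, unfinished one.
    Returns (t, H(t)) at each distinct observed jump time.
    """
    events = Counter(d for d, o in zip(durations, observed) if o)
    h, out = 0.0, []
    for t in sorted(set(durations)):
        n_at_risk = sum(1 for d in durations if d >= t)  # durations still "alive" at t
        d_t = events[t]                                   # number of jumps at t
        if d_t:
            h += d_t / n_at_risk
            out.append((t, h))
    return out
```

    For three uncensored durations 1, 2 and 3, the estimate at t = 3 is 1/3 + 1/2 + 1 = 11/6; censoring the middle duration removes its event term but keeps it in the risk set at t = 1.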

    Detection of Common Subtrees with Identical Label Distribution

    Frequent pattern mining is a relevant method for analysing structured data such as sequences, trees or graphs. It consists of identifying characteristic substructures of a dataset. This paper deals with a new type of pattern for tree data: common subtrees with identical label distribution. Their detection is far from obvious since the underlying isomorphism problem is graph-isomorphism-complete. An elaborate search algorithm is developed and analysed from both theoretical and numerical perspectives. Based on this, the enumeration of patterns is performed through a new lossless compression scheme for trees, called DAG-RW, whose complexity is investigated as well. The method shows very good properties, both in terms of computation time and analysis of real datasets from the literature. Compared with other substructures such as topological subtrees and labelled subtrees, for which the isomorphism problem is linear, the patterns found provide a more parsimonious representation of the data. Comment: 40 pages
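    As background on lossless tree compression, the sketch below shows the classical DAG reduction of an unordered, unlabelled tree: isomorphic subtrees are merged into a single node, and each node stores the sorted multiset of its children's ids, so the original tree can be rebuilt. This is a standard technique shown for illustration only; it is not DAG-RW, which the abstract does not detail, and it ignores labels. The nested-tuple encoding and the name `to_dag` are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical encoding: a tree is a nested tuple of its children; a leaf is ().
Tree = Tuple


def to_dag(tree: Tree) -> Tuple[int, Dict[int, Tuple[int, ...]]]:
    """Merge isomorphic unordered subtrees into one DAG node each.

    Returns (root_id, nodes), where nodes[i] is the sorted tuple of child ids of node i.
    Repeated ids keep child multiplicities, so the compression is lossless.
    """
    ids: Dict[Tuple[int, ...], int] = {}
    nodes: Dict[int, Tuple[int, ...]] = {}

    def visit(t: Tree) -> int:
        # Two subtrees are isomorphic iff their multisets of child classes coincide.
        key = tuple(sorted(visit(child) for child in t))
        if key not in ids:
            ids[key] = len(ids)
            nodes[ids[key]] = key
        return ids[key]

    return visit(tree), nodes
```

    For example, a root with two identical "node with one leaf" children and one extra leaf has six nodes, but `to_dag` returns only three: the leaf, the one-child node, and the root.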

    The Weight Function in the Subtree Kernel is Decisive

    Tree data are ubiquitous because they model a large variety of situations, e.g., the architecture of plants, the secondary structure of RNA, or the hierarchy of XML files. Nevertheless, the analysis of these non-Euclidean data is difficult per se. In this paper, we focus on the subtree kernel, a convolution kernel for tree data introduced by Vishwanathan and Smola in the early 2000s. More precisely, we investigate the influence of the weight function, both from a theoretical perspective and in real data applications. We establish on a two-class stochastic model that the performance of the subtree kernel improves when the weight of leaves vanishes, which motivates the definition of a new weight function that is learned from the data rather than fixed by the user, as is usually done. To this end, we define a unified framework for computing the subtree kernel from ordered or unordered trees that is particularly suitable for tuning parameters. Through two real data classification problems, we show the high efficiency of our approach, in particular compared with the approaches considered in the literature, which also confirms the importance of the weight function. Finally, a visualization tool for the significant features is derived. Comment: 28 pages

    Estimation of Piecewise-Deterministic Trajectories in a Quantum Optics Scenario

    The manipulation of individual copies of quantum systems is one of the most groundbreaking experimental discoveries in the field of quantum physics. On both experimental and theoretical levels, it has been shown that the dynamics of a single copy of an open quantum system is a trajectory of a piecewise-deterministic process. To the best of our knowledge, this application field has not been explored in the applied mathematics literature, from either a probabilistic or a statistical perspective. The objective of this chapter is to provide a self-contained presentation of this kind of model and of its specificities in terms of the observation scheme of the system, as well as a first attempt to deal with a statistical issue that arises in the quantum world.

    Optimal nonparametric estimation of the jump rate of a piecewise-deterministic Markov process

    A piecewise-deterministic Markov process is a stochastic process whose trajectory is described by a differential equation perturbed by random jumps at random times. We are interested in estimating the jump rate of such a process observed over a long time under an ergodicity assumption. We introduce a class of consistent and asymptotically Gaussian nonparametric estimators. We propose choosing the estimator with minimal variance, a variance which must itself be estimated.
